Tag

#multimodal AI

45 articles

Alibaba's Qwen-Image-3.0 renders full infographic grids and readable ten-pixel text in a single pass

Alibaba's Qwen-Image-3.0 introduces advanced image generation capabilities, including support for 4,500-token prompts, readable ten-pixel text, and complex layout rendering in a single pass.

Jul 21

Alibaba's Qwen takes on Kimi K3 with open-weight Qwen 3.8, says model is "second only to Fable 5"

This explainer explores Alibaba's Qwen 3.8, a multimodal AI model with 2.4 trillion parameters that rivals top-tier models like Fable 5. We examine its architecture, training methods, and implications for the future of large language models.

Jul 19

Thinking Machines Lab Drops Its First Model

Thinking Machines Lab launches Inkling, a 975-billion-parameter open source model trained to understand video and audio, positioning itself against competitors like Anthropic and OpenAI.

Jul 15

Thinking Machines Lab Releases Inkling: A 975B-Parameter Open-Weights Multimodal MoE With 41B Active Parameters And Controllable Thinking Effort

Learn how to set up and run inference with the Inkling multimodal AI model from Thinking Machines Lab, including text and image processing with controllable thinking effort.

Jul 15

Building a VideoAgent-Style Multi-Agent System: Intent Parsing, Graph Planning, and Tool Routing for Video Editing Tasks

Researchers have reconstructed the VideoAgent workflow into a functional, API-key-free multi-agent system for AI-powered video editing, enabling natural language interactions and automated video processing.

Jul 13

Meta Superintelligence Labs Releases Muse Spark 1.1: A Multimodal Reasoning Model for Agentic Tasks on Meta Model API

Meta Superintelligence Labs introduces Muse Spark 1.1, a multimodal reasoning model for agentic tasks, featuring a 1,000,000-token context window and multi-agent delegation capabilities.

Jul 9

Why the next leap in AI video is teaching avatars to see and listen

The next leap in AI video is not just about improving visual fidelity but about teaching avatars to see, hear, and interact in real time. This shift is changing how we think about digital experiences.

Jul 2

What to expect from WWDC 2026: Siri’s highly anticipated revamp and Apple Intelligence updates

Explains Apple's advanced 'Apple Intelligence' framework, detailing how transformer-based architectures, multimodal processing, and privacy-preserving techniques will revolutionize AI assistants and human-computer interaction.

Jun 6

The latest AI news we announced in May 2026

Google AI announced major advancements in multimodal models, safety measures, and enterprise applications in May 2026. The company's Gemini 2.0 release represents a significant leap in AI capabilities and accessibility.

Jun 5

Google DeepMind's Gemma 4 12B squeezes multimodal AI onto a laptop with just 16 GB of RAM

Google DeepMind's Gemma 4 12B is an open-source multimodal AI model that runs efficiently on laptops with just 16 GB of RAM, nearly matching the performance of its larger 26B counterpart.

Jun 3

Alibaba’s Qwen Team Launches Qwen3.7-Plus, Adding Vision, Deep Reasoning, Tool Invocation, and Autonomous Iteration on the Bailian Platform

Alibaba's Qwen team launches Qwen3.7-Plus, a multimodal AI model on the Bailian platform, featuring vision understanding, deep reasoning, tool invocation, and autonomous iteration.

Jun 2

MiniMax M3: Open-weight model with a million-token context challenges proprietary leaders

Chinese AI company MiniMax has unveiled M3, the first open-weight model combining top-tier coding performance, a one-million-token context window, and native multimodality, challenging proprietary leaders in the AI space.

Jun 1